Active Learning with Scarcely Labeled Data via Bias Variance Reduction

Author

  • Minoo Aminian

Abstract

In many real-life situations we face the problem of classifying partially labeled data, known as semi-supervised learning. We consider the special case of scarcely labeled data, where the labeled data are insufficient, and present a principled method that applies active learning to scarcely labeled data to improve the learner's performance. The method builds on recent work on bias-variance decomposition for a 0-1 loss function. We focus on bias and variance reduction to reduce 0-1 loss: we first select a random pool from the unlabeled data, and then use the most informative instances from that pool to reduce the variance, the bias, and thereby the overall loss of the learner via active learning. Our empirical results show that this technique can decrease the learner's loss significantly.
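The pool-then-query loop described in the abstract can be illustrated with a minimal sketch. The paper's selection criterion is based on bias-variance decomposition of the 0-1 loss; the sketch below substitutes a generic uncertainty-sampling criterion as a stand-in, so the dataset, pool size, and scoring rule are all illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-in data: a few labeled points, many unlabeled ones.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
labeled = list(range(10))            # scarce initial labels
unlabeled = list(range(10, 500))

clf = LogisticRegression(max_iter=1000)

for _ in range(20):                  # active-learning rounds
    clf.fit(X[labeled], y[labeled])
    # Step 1: draw a random pool from the unlabeled data.
    pool = rng.choice(unlabeled, size=50, replace=False)
    # Step 2: pick the most informative pool instance. Here
    # "informative" means closest to the decision boundary, a
    # stand-in for the paper's bias/variance-based criterion.
    probs = clf.predict_proba(X[pool])
    margin = np.abs(probs[:, 1] - 0.5)
    pick = int(pool[np.argmin(margin)])
    labeled.append(pick)             # query its label (oracle = y)
    unlabeled.remove(pick)

print(clf.score(X, y))
```

Each round grows the labeled set by one carefully chosen example rather than a random one, which is the mechanism by which active learning stretches a scarce labeling budget.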


Similar articles

Active Learning with Partially Labeled Data via Bias Reduction

With active learning the learner participates in the process of selecting instances so as to speed up convergence to the "best" model. This paper presents a principled method of instance selection based on the recent bias variance decomposition work for a 0-1 loss function. We focus on bias reduction to reduce 0-1 loss by using an approximation to the optimal Bayes classifier to calculate the b...


Correcting Sampling Bias in Structural Genomics through Iterative Selection of Underrepresented Targets

In this study we proposed an iterative procedure for correcting sampling bias in labeled datasets for supervised learning applications. Given a much larger and unbiased unlabeled dataset, our approach relies on training contrast classifiers to iteratively select unlabeled examples most highly underrepresented in the labeled dataset. Once labeled, these examples could greatly reduce the sampling...


Semi-Supervised Dimensionality Reduction via Canonical Correlation Analysis

We analyze the multi-view regression problem where we have two views (X1, X2) of the input data and a real target variable Y of interest. In a semi-supervised learning setting, we consider two separate assumptions (one based on redundancy and the other based on (de)correlation) and show how, under either assumption alone, dimensionality reduction (based on CCA) could reduce the labeled sample co...


Weighted Proportional k-Interval Discretization for Naive-Bayes Classifiers

The use of different discretization techniques can be expected to affect the classification bias and variance of naive-Bayes classifiers. We call such an effect discretization bias and variance. Proportional k-interval discretization (PKID) tunes discretization bias and variance by adjusting discretized interval size and number proportional to the number of training instances. Theoretical analys...


Incremental Active Learning in Consideration of Bias

The problem of designing input signals for optimal generalization in supervised learning is called active learning. In many active learning methods devised so far, the sampling location minimizing the variance of the learning results is selected. This implies that the bias of the learning results is assumed to be zero or small enough to be neglected. In this paper, we propose an active learning...



Journal:

Volume   Issue 

Pages  -

Publication date: 2005